5 research outputs found

Region-of-Interest Based Neural Video Compression

Author: Abati Davide
Cohen Taco S
Habibian Amirhossein
Perugachi-Diaz Yura
Sautière Guillaume
Yang Yang
Publication date: 02/11/2022

Humans do not perceive all parts of a scene with the same resolution, but rather focus on a few regions of interest (ROIs). Traditional object-based codecs take advantage of this biological intuition and are capable of non-uniform allocation of bits in favor of salient regions, at the expense of increased distortion in the remaining areas: such a strategy allows a boost in perceptual quality under low rate constraints. Recently, several neural codecs have been introduced for video compression, yet they operate uniformly over all spatial locations, lacking the capability of ROI-based processing. In this paper, we introduce two models for ROI-based neural video coding. First, we propose an implicit model that is fed with a binary ROI mask and is trained by de-emphasizing the distortion of the background. Second, we design an explicit latent scaling method that allows control over the quantization binwidth for different spatial regions of latent variables, conditioned on the ROI mask. Through extensive experiments, we show that our methods outperform all our baselines in terms of Rate-Distortion (R-D) performance in the ROI. Moreover, they can generalize to different datasets and to arbitrary ROIs at inference time. Finally, they do not require expensive pixel-level annotations during training, as synthetic ROI masks can be used with little to no degradation in performance. To the best of our knowledge, our proposals are the first solutions that integrate ROI-based capabilities into neural video compression models.

Comment: Updated arXiv version to the camera-ready version after acceptance at British Machine Vision Conference (BMVC) 202
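As a rough illustration of the two ideas named in the abstract above, the sketch below shows (a) a distortion term that down-weights background pixels and (b) quantization whose bin width differs per latent element depending on the ROI mask. This is not the authors' implementation: the abstract gives no formulas, so the function names, the `background_weight` value, the bin widths, and the rounding scheme are all assumptions made for illustration.

```python
def roi_weighted_mse(original, reconstruction, roi_mask, background_weight=0.1):
    """Squared error averaged over all pixels, with background pixels down-weighted.

    Inputs are 2-D lists of equal shape; roi_mask holds 1 (ROI) or 0 (background).
    background_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    total = 0.0
    count = 0
    for orig_row, rec_row, mask_row in zip(original, reconstruction, roi_mask):
        for o, r, m in zip(orig_row, rec_row, mask_row):
            weight = 1.0 if m else background_weight
            total += weight * (o - r) ** 2
            count += 1
    return total / count


def bin_widths_from_mask(roi_mask_flat, roi_width=1.0, background_width=4.0):
    """Pick a quantization bin width per latent element (assumed values):
    fine bins inside the ROI, coarse bins in the background."""
    return [roi_width if m else background_width for m in roi_mask_flat]


def quantize_latents(latents, bin_widths):
    """Scale, round to the nearest integer, and rescale each latent element.

    Wider bins mean coarser quantization. Python's round() rounds exact
    halves to the nearest even integer.
    """
    return [round(y / w) * w for y, w in zip(latents, bin_widths)]


if __name__ == "__main__":
    mask = [1, 0]
    widths = bin_widths_from_mask(mask)
    print(quantize_latents([2.3, 2.3], widths))
```

In this toy setup, the same latent value is kept at a finer precision inside the ROI than in the background, which is the qualitative behavior the abstract describes for latent scaling.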

arXiv.org e-Print Archive

Invertible DenseNets with Concatenated LipSwish

Author: Bhulai Sandjai
Perugachi-Diaz Yura
Tomczak Jakub M.
Publication date: 23/10/2021

We introduce Invertible Dense Networks (i-DenseNets), a more parameter-efficient extension of Residual Flows. The method relies on an analysis of the Lipschitz continuity of the concatenation in DenseNets, where we enforce invertibility of the network by satisfying the Lipschitz constraint. Furthermore, we propose a learnable weighted concatenation, which not only improves the model performance but also indicates the importance of the concatenated weighted representation. Additionally, we introduce the Concatenated LipSwish as an activation function, for which we show how to enforce the Lipschitz condition and which boosts performance. The new architecture, i-DenseNet, outperforms Residual Flow and other flow-based models on density estimation evaluated in bits per dimension, where we utilize an equal parameter budget. Moreover, we show that the proposed model outperforms Residual Flows when trained as a hybrid model where the model is both a generative and a discriminative model.

Comment: Accepted at Neural Information Processing Systems (NeurIPS) 2021. This is an extension of Invertible DenseNets (arXiv:2010.02125). arXiv admin note: text overlap with arXiv:2010.0212
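To make the Concatenated LipSwish idea more concrete, here is a minimal scalar sketch: the input is passed through a scaled Swish for both x and -x, the two results are concatenated, and the output is divided by a constant so that the whole map stays roughly 1-Lipschitz. The abstract does not state the formula, so the scaling constants (1.1 for LipSwish, 1.004 for the concatenation) and the fixed `beta` are assumptions based on a reading of the Residual Flows and i-DenseNets papers and should be checked against them; the function names are hypothetical.

```python
import math


def lipswish(x, beta=1.0):
    """Swish, x * sigmoid(beta * x), divided by 1.1 (assumed constant)
    so that its slope stays at or below roughly 1."""
    return x / (1.0 + math.exp(-beta * x)) / 1.1


def clipswish(values, beta=1.0):
    """Concatenate LipSwish(x) and LipSwish(-x), then rescale.

    The output has twice as many entries as the input. The divisor 1.004
    is an assumed bound on the Lipschitz constant of the concatenation.
    """
    positive = [lipswish(v, beta) for v in values]
    negative = [lipswish(-v, beta) for v in values]
    return [a / 1.004 for a in positive + negative]


if __name__ == "__main__":
    print(clipswish([0.0, 1.0, -2.0]))
```

Because the activation doubles the number of features, a layer that follows it would need to accept twice the width, which is one practical consequence of the concatenation.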

arXiv.org e-Print Archive

VU Research Portal

Deep learning for white cabbage seedling prediction

Author: Bhulai Sandjai
Perugachi-Diaz Yura
Tomczak Jakub M.
Publication venue: 'Elsevier BV'
Publication date: 01/05/2021

In this study, the classification of white cabbage seedling images is modeled with convolutional neural networks. We focus on a dataset that tracks the seedling growth over a period of 14 days, where photos were taken at four specific moments. The dataset contains 13,200 individual seedlings with corresponding labels and was obtained from Bejo, a company operating in agriculture. Different pre-trained convolutional neural network and multi-layer perceptron architectures are developed, along with a traditional statistical method, logistic regression. The models are trained to predict the (un)successful growth of the seedlings. We find that the convolutional neural networks outperform the other models, with AlexNet being the best-performing model in this research. On the test set, AlexNet classifies 94% of the seedlings correctly, with an area under the curve of 0.95. Accordingly, AlexNet is shown to be useful and robust in this particular classification task. AlexNet can be further deployed as an early warning tool to aid professionals in making important decisions. Additionally, this model can be further developed to automate the process.

VU Research Portal

Invertible DenseNets with Concatenated LipSwish

Author: Beygelzimer Alina
Bhulai Sandjai
Dauphin Yann
Liang Percy S.
Perugachi-Diaz Yura
Ranzato Marc'Aurelio
Tomczak Jakub M.
Wortman Vaughan Jenn
Publication venue: Neural information processing systems foundation
Publication date: 01/12/2021

We introduce Invertible Dense Networks (i-DenseNets), a more parameter-efficient extension of Residual Flows. The method relies on an analysis of the Lipschitz continuity of the concatenation in DenseNets, where we enforce invertibility of the network by satisfying the Lipschitz constraint. Furthermore, we propose a learnable weighted concatenation, which not only improves the model performance but also indicates the importance of the concatenated weighted representation. Additionally, we introduce the Concatenated LipSwish as an activation function, for which we show how to enforce the Lipschitz condition and which boosts performance. The new architecture, i-DenseNet, outperforms Residual Flow and other flow-based models on density estimation evaluated in bits per dimension, where we utilize an equal parameter budget. Moreover, we show that the proposed model outperforms Residual Flows when trained as a hybrid model where the model is both a generative and a discriminative model.